Model-driven testing
- 2025-08-23
Alloy와 단위 테스트를 조화롭게 연결하는 프로세스.
AI가 생성한 콘텐츠가 포함됨. 아래는 모두 AI가 생성한 글이다. Gemini 2.5 Pro와 대화한 내용을 바탕으로 만든 초안 Claude Opus 4.1에게 제공하고 “extended thinking” 모드를 켠 상태에서 개선된 버전을 생성하도록 했다.
Executive Summary
This manual presents a systematic approach to using Alloy as a test generation engine for Test-Driven Development. By treating Alloy models as executable specifications, we can automatically derive comprehensive test suites that catch bugs before they’re written.
Core Workflow: Model → Analyze → Generate Tests → Code → Validate → Refine
Key Principle: Every Alloy construct maps to specific types of tests. The model doesn’t just verify your design—it actively generates the test cases most likely to break your system.
Part 1: The Alloy Modeler’s Guide (The Architect)
Your role is to create a logically sound, verifiable specification that serves as both a design document and a test generation engine. Think of yourself as writing executable requirements.
1.1 Model the System’s Core Structure
Signatures and Fields define your system’s vocabulary and relationships:
sig User {
posts: set Post,
following: set User
}
sig Post {
author: one User,
comments: set Comment,
timestamp: one Time
}
sig Comment {
author: one User,
content: one String
}
Deliverable for Testing: Each sig becomes a test class, each field becomes a property test.
1.2 Define Invariants with Facts
Facts are non-negotiable rules that must always hold:
fact NoSelfFollowing {
no u: User | u in u.following
}
fact ConsistentOwnership {
all p: Post | p in p.author.posts
}
fact NoOrphanedComments {
all c: Comment | some p: Post | c in p.comments
}
Deliverable for Testing: Each fact generates negative tests that attempt violations.
1.3 Create Operations with Predicates
Predicates model operations and state transitions—your system’s behavior:
pred addComment[pre, post: State, u: User, p: Post, c: Comment] {
// Preconditions
p in pre.posts
c not in pre.allComments
// Postconditions
post.allComments = pre.allComments + c
post.posts = pre.posts
p.comments' = p.comments + c
c.author' = u
// Frame conditions (what doesn't change)
all p2: Post - p | p2.comments' = p2.comments
}
Deliverable for Testing: Each predicate generates parameterized test scenarios.
1.4 Explore Valid States with Run Commands
Run commands generate concrete, valid instances:
run ShowActiveDiscussion {
some p: Post | #p.comments > 3
some u: User | #u.following > 2
} for 5
Deliverable for Testing: Each instance becomes a “happy path” test with specific data.
1.5 Verify Properties with Assertions
Assertions are properties you believe must be true:
assert NoCommentWithoutPost {
all c: Comment | some p: Post | c in p.comments
}
check NoCommentWithoutPost for 10
assert FollowingIsAcyclic {
no u: User | u in u. #
}
check FollowingIsAcyclic for 8
Deliverable for Testing:
- Passing assertions validate your model’s consistency
- Failed assertions (counterexamples) become critical test cases
1.6 Find Boundary Conditions
Systematically explore different scopes to find boundaries:
// Find minimum configuration that exhibits interesting behavior
run MinimalNetwork {
some u: User | #u.following > 0
} for 2
// Find stress scenarios
run StressTest {
all u: User | #u.posts > 3
} for 10
Deliverable for Testing: Boundary instances become edge case tests.
1.7 Model Temporal Properties (Alloy 6+)
For systems with state evolution:
pred init[s: State] {
no s.posts
no s.comments
}
pred stutter[s, s': State] {
s.posts = s'.posts
s.comments = s'.comments
}
fact Traces {
init[first]
all s: State - last |
addPost[s, s.next] or
addComment[s, s.next] or
stutter[s, s.next]
}
assert EventuallyActiveDiscussion {
always (some p: Post | eventually #p.comments > 5)
}
Deliverable for Testing: Temporal properties generate sequence-based integration tests.
1.8 The Complete Model Checklist
Before handoff to the programmer:
- All
check
commands pass (no counterexamples found) - All
run
commands produce meaningful instances - Edge cases explored at minimum and maximum scopes
- Predicates cover all major operations
- Facts capture all critical invariants
- Document why each constraint exists (use comments liberally)
Part 2: The Programmer’s Guide (The Builder)
Your role is to transform the model’s logical guarantees into executable tests, then write code that passes them. The model is your test oracle—it tells you not just what to test, but what the expected behavior should be.
2.1 Setting Up the Test Architecture
Create a test structure that mirrors the model:
# test_structure.py
class ModelBasedTest:
"""Base class for tests derived from Alloy model"""
@classmethod
def from_alloy_instance(cls, instance_xml):
"""Generate test data from Alloy XML output"""
# Parse the instance and create test objects
pass
def assert_invariant(self, fact_name):
"""Verify a fact holds in current state"""
pass
2.2 Mapping Signatures to Test Classes
➡️ From each sig
in the model:
sig Post {
author: one User,
comments: set Comment
}
Generate class and property tests:
class TestPost(unittest.TestCase):
def test_post_must_have_exactly_one_author(self):
"""From: 'author: one User'"""
post = Post()
# Test 'one' multiplicity
with self.assertRaises(ValidationError):
post.save() # No author
post.author = user1
post.save() # Should succeed
with self.assertRaises(ValidationError):
post.author = [user1, user2] # Multiple authors
def test_post_can_have_multiple_comments(self):
"""From: 'comments: set Comment'"""
post = Post(author=user)
comment1 = Comment(content="First")
comment2 = Comment(content="Second")
post.add_comment(comment1)
post.add_comment(comment2)
self.assertEqual(len(post.comments), 2)
2.3 Mapping Facts to Invariant Tests
➡️ From each fact
in the model:
fact NoOrphanedComments {
all c: Comment | some p: Post | c in p.comments
}
Generate violation tests:
class TestInvariants(unittest.TestCase):
def test_cannot_create_orphaned_comment(self):
"""From fact: NoOrphanedComments"""
comment = Comment(author=user, content="Orphan")
# Attempt to save comment without associating with post
with self.assertRaises(InvariantViolation) as ctx:
comment.save()
self.assertEqual(ctx.exception.message,
"Comment must belong to a post")
def test_cannot_follow_self(self):
"""From fact: NoSelfFollowing"""
user = User(name="Alice")
with self.assertRaises(InvariantViolation):
user.follow(user)
2.4 Mapping Run Commands to Happy Path Tests
➡️ From each run
instance:
// Instance generated by: run ShowActiveDiscussion for 5
// User0.following = {User1, User2}
// Post0.author = User0
// Post0.comments = {Comment0, Comment1, Comment2}
Generate concrete scenario tests:
class TestScenarios(unittest.TestCase):
def test_active_discussion_scenario(self):
"""From run: ShowActiveDiscussion"""
# Setup exact state from Alloy instance
user0 = User(name="User0")
user1 = User(name="User1")
user2 = User(name="User2")
user0.follow(user1)
user0.follow(user2)
post = Post(author=user0)
comments = [
Comment(author=user1, content="Comment0"),
Comment(author=user2, content="Comment1"),
Comment(author=user1, content="Comment2")
]
for comment in comments:
post.add_comment(comment)
# Verify the state is valid and behaves correctly
self.assertEqual(len(post.comments), 3)
self.assertEqual(len(user0.following), 2)
self.assertTrue(post.is_active_discussion())
2.5 Mapping Predicates to Operation Tests
➡️ From each pred
in the model:
pred addComment[pre, post: State, u: User, p: Post, c: Comment] {
p in pre.posts
c not in pre.allComments
post.allComments = pre.allComments + c
c.author' = u
}
Generate parameterized operation tests:
class TestOperations(unittest.TestCase):
@parameterized.expand([
# Generate test cases from Alloy pred instances
("regular_user", "public_post", "text_comment"),
("author_self_comment", "own_post", "reply_comment"),
("follower_comment", "followed_user_post", "emoji_comment"),
])
def test_add_comment_operation(self, user_type, post_type, comment_type):
"""From pred: addComment"""
# Setup preconditions
user = self.create_user(user_type)
post = self.create_post(post_type)
comment = self.create_comment(comment_type)
# Verify preconditions from predicate
self.assertIn(post, system.posts)
self.assertNotIn(comment, system.all_comments)
# Execute operation
result = post.add_comment(user, comment)
# Verify postconditions from predicate
self.assertIn(comment, system.all_comments)
self.assertEqual(comment.author, user)
self.assertIn(comment, post.comments)
2.6 Mapping Assertions to Property Tests
➡️ From each counterexample found:
// Counterexample found for: assert NoCycles
// User0.following = User1
// User1.following = User2
// User2.following = User0 -- Cycle!
Generate regression tests:
class TestProperties(unittest.TestCase):
def test_prevent_following_cycles(self):
"""From counterexample: NoCycles assertion"""
user0 = User("Alice")
user1 = User("Bob")
user2 = User("Charlie")
user0.follow(user1)
user1.follow(user2)
# This would create a cycle
with self.assertRaises(CycleDetectedError):
user2.follow(user0)
2.7 Boundary and Edge Case Tests
➡️ From scope analysis:
class TestBoundaries(unittest.TestCase):
def test_minimum_valid_configuration(self):
"""From: run for 2 (minimum scope that works)"""
# Test with exactly 2 users, 1 post, 1 comment
users = [User(f"User{i}") for i in range(2)]
post = Post(author=users[0])
comment = Comment(author=users[1])
post.add_comment(comment)
# System should function with minimal objects
self.assertTrue(system.is_valid())
def test_maximum_stress_scenario(self):
"""From: check for 10 (maximum tested scope)"""
# Create maximum configuration that Alloy verified
users = [User(f"User{i}") for i in range(10)]
# Each user creates maximum posts
for user in users:
for i in range(10):
post = Post(author=user)
# Add maximum comments
for commenter in users:
post.add_comment(Comment(author=commenter))
# System should handle maximum verified load
self.assertTrue(system.is_valid())
self.assertLess(system.response_time(), 1.0) # Performance boundary
2.8 Temporal Property Tests
➡️ From temporal assertions:
class TestTemporalProperties(unittest.TestCase):
def test_eventually_active_discussion(self):
"""From: assert EventuallyActiveDiscussion"""
system = System()
system.init()
# Simulate time progression
for tick in range(100):
system.tick()
# Check if we've reached active discussion state
if any(len(p.comments) > 5 for p in system.posts):
break
else:
self.fail("System never reached active discussion state")
def test_invariant_holds_across_all_transitions(self):
"""From: always (no orphaned comments)"""
system = System()
for _ in range(1000): # Random transitions
operation = random.choice([
system.add_post,
system.add_comment,
system.delete_post,
system.delete_comment
])
try:
operation(*self.random_args())
except InvariantViolation:
pass # Expected for invalid operations
# Invariant must hold after every transition
self.assertTrue(system.check_no_orphaned_comments())
2.9 Test Oracle Extraction
Use Alloy’s evaluator to derive expected values:
class TestOracle:
def extract_expectations_from_instance(self, alloy_instance):
"""
Use Alloy evaluator to compute expected values
Example: In Alloy evaluator, run:
#Post.comments
User.following
all p: Post | p.author in p.comments.author
"""
expectations = {
'total_comments': alloy_eval("#Post.comments"),
'following_graph': alloy_eval("User.following"),
'authors_comment_own': alloy_eval("all p: Post | p.author in p.comments.author")
}
return expectations
2.10 Coverage Metrics
Track test coverage against the model:
class ModelCoverageReport:
def generate_report(self):
return {
'signatures': {
'total': count_sigs_in_model(),
'tested': count_sigs_with_tests(),
'coverage': percentage()
},
'facts': {
'total': count_facts_in_model(),
'tested': count_facts_with_violation_tests(),
'coverage': percentage()
},
'predicates': {
'total': count_predicates_in_model(),
'tested': count_predicates_with_tests(),
'coverage': percentage()
},
'assertions': {
'total': count_assertions_checked(),
'counterexamples_tested': count_counterexample_tests(),
'coverage': percentage()
},
'boundaries': {
'min_scope_tested': True/False,
'max_scope_tested': True/False,
'edge_cases': count_edge_case_tests()
}
}
Part 3: Advanced Techniques
3.1 Mutation Testing Alignment
Every Alloy assertion that passes represents a mutation that would break your system:
class MutationTest(unittest.TestCase):
def test_mutations_from_alloy_assertions(self):
"""Each passing assertion = a mutation to catch"""
mutations = [
lambda: allow_self_following(), # Violates NoSelfFollowing
lambda: allow_orphaned_comments(), # Violates NoOrphanedComments
lambda: allow_cycles(), # Violates NoCycles
]
for mutation in mutations:
with self.assertRaises(TestFailure):
apply_mutation(mutation)
run_test_suite()
3.2 Performance Test Generation
Use Alloy’s scope limits to design performance tests:
def test_performance_at_alloy_limits(self):
"""If Alloy handles scope N, so should our system"""
max_scope = 10 # Maximum scope Alloy could verify
# Generate load matching Alloy's maximum scope
create_users(max_scope)
create_posts(max_scope * max_scope)
create_comments(max_scope ** 3)
start_time = time.time()
system.validate_all_invariants()
duration = time.time() - start_time
self.assertLess(duration, 5.0) # Should complete in reasonable time
3.3 Debugging Failed Constraints
When implementation can’t satisfy a model constraint:
def debug_constraint_failure(self, constraint_name):
"""Understand why a constraint fails"""
# 1. Temporarily remove constraint from model
# 2. Run check to see what breaks
# 3. The counterexample shows why constraint was needed
counterexample = alloy_check_without_constraint(constraint_name)
if counterexample:
print(f"Constraint {constraint_name} prevents: {counterexample}")
# Either fix implementation or refine model
3.4 Incremental Refinement Process
class ModelRefinement:
def refine_based_on_implementation_feedback(self):
"""
When implementation reveals model is too restrictive:
1. Identify the problematic constraint
2. Understand its purpose via counterexamples
3. Refine to be less restrictive while maintaining safety
4. Regenerate tests
5. Verify implementation against refined model
"""
pass
Part 4: Common Patterns and Pitfalls
4.1 Pattern: State Machine Testing
abstract sig State {}
one sig Init, Active, Terminated extends State {}
sig Process {
var state: one State
}
pred transition[p: Process, from, to: State] {
p.state = from
p.state' = to
}
Maps to state transition tests covering all valid paths.
4.2 Pattern: Access Control Testing
pred canAccess[u: User, r: Resource] {
r in u.owned or
r in u.group.resources or
r.public = True
}
Generates permission matrix tests for all user-resource combinations.
4.3 Pitfall: Over-Constraining
Problem: Model too restrictive, implementation can’t satisfy it. Solution: Start with minimal constraints, add incrementally.
4.4 Pitfall: Under-Testing Boundaries
Problem: Only testing within comfortable scopes. Solution: Always test at scope 2 (minimum) and maximum viable scope.
Part 5: Quick Reference
Model → Test Mapping
Alloy Construct | Test Type | Test Purpose |
---|---|---|
sig + fields |
Property tests | Validate structure and multiplicities |
fact |
Invariant violation tests | Ensure rules cannot be broken |
pred |
Operation tests | Test state transitions |
run instance |
Happy path tests | Verify valid scenarios work |
check counterexample |
Regression tests | Prevent specific bugs |
assert (passing) |
Property tests | Verify desired properties hold |
Scope boundaries | Edge case tests | Test at limits |
Temporal properties | Sequence tests | Test behavior over time |
Test Generation Checklist
- Every sig has a test class
- Every field has a multiplicity test
- Every fact has a violation test
- Every successful run has a scenario test
- Every predicate has parameterized tests
- Every found counterexample has a regression test
- Minimum scope (2) tested
- Maximum scope tested
- Performance tested at maximum scope
- Temporal properties have sequence tests
- Test oracle extracted from Alloy evaluator
- Coverage report shows >90% model coverage
Conclusion
This approach transforms Alloy from a design verification tool into a comprehensive test generation engine. The model becomes a living specification that not only proves your design is correct but also generates the tests to ensure your implementation matches that design.
Remember: The bugs Alloy finds in your model are the bugs your tests will prevent in your code.